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Abstract 

Background: The stability of long intergenic non-coding RNAs (lincRNAs) that possess tissue/cell-specific expression, 
might be closely related to their physiological functions. However, the mechanism associated with stability of lincRNA 
remains elusive. In this study, we try to study the stability of lincRNA in K562 cells, an important model cell, through 
comparing two K562 transcriptomes which are obtained from ENCODE Consortium and our sequenced RNA-Seq 
dataset (PH) respectively. 

Results: By lincRNAs analysis pipeline, 1804 high-confidence lincRNAs involving 1564 annotated lincRNAs and 240 
putative novel lincRNAs were identified in PH, and 1587 high-confidence lincRNAs including 1429 annotated lincRNAs 
and 1 58 putative novel lincRNAs in ENCODE. There are 1 009 unique lincRNAs in PH, 792 unique lincRNAs were 
in ENCODE, and 795 overlapping lincRNAs in both datasets. The analysis of differences in minimum free energy 
distribution and lincRNA half-life showed that a large proportion of overlapping lincRNAs were more stable than 
the unique lincRNAs. Most lincRNAs were more unstable than protein-coding RNAs through comparing their 
minimum free energy. 

Conclusions: Identification of overlapping and unique lincRNAs can be helpful to classify the stability of 
lincRNAs. Our results suggest that overlapping lincRNAs (relatively stable linRNAs) and unique lincRNAs 
(relatively unstable lincRNAs) might be involved in different cellular processes. 

Reviewers: This article has been reviewed by Prof. Oliviero Carugo, Dr. Alistair Forrest and Prof. Manju Bansal. 
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Background 

The mammalian genome is extensively transcribed, giving 
rise to many thousands of non-coding transcripts including 
both short transcripts (<200 nucleotides in length) and 
long non-coding RNAs (IncRNAs) (>200 nucleotides in 
length) [1]. LncRNAs recently have caused more attention 
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because they interact with other biological molecules to 
regulate diverse cellular processes [2], According to the 
location and context in genome, IncRNAs can be classified 
into intergenic IncRNAs, intronic IncRNAs, sense IncRNAs 
and antisense IncRNAs [3]. Present studies are mainly 
focused on intergenic IncRNAs (lincRNAs) located and 
transcribed from intergenic genomic regions, not only 
because lincRNAs are more convenient for experimental 
manipulation and computational analysis without the 
interference of annotated protein-coding regions than 
other IncRNAs [4], but also because lincRNAs partici- 
pate in many cellular processes from embryonic stem 
cell pluripotency to cell proliferation and cancer pro- 
gression [5,6]. 
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With large-scale transcriptome sequencing, a growing 
number of lincRNAs have been identified in mammals [4]. 
However, only a few of lincRNAs have been functionally 
characterized. The lincRNA HOTAIRM1 modulates the 
expression of several HOXA genes, which encode key 
transcription factors for differentiation of myeloblasts [7]. 
LincRNA-p21 plays a functional role in p53 response 
pathways through affecting the expression of numerous 
genes associating with p53, and recruits ribonucleoprotein 
K (hnRNP-K) to trigger the regulation of p53-mediated 
apoptosis [8]. LincRNA-EPS at the nuclear localization is 
up-regulated in murine erythroid terminal differentiation, 
which promotes survival of murine erythroblasts [9]. 
Overall, many lincRNAs can regulate gene expression 
through specific interactions with other cellular factors, 
DNA, and other RNA molecules, such as acting in cis on 
neighboring genes or acting in tram regardless of gene 
location [10,11]. Another proposal is that IncRNA genes 
(including lincRNA) act as enhancer regions and may be 
incidental by-products of transcription [12]. 

Previous studies have indicated that RNA stability 
influences the abundance of transcript and shapes the 
kinetics of gene induction in intricate gene networks in 
mammalian cells [13-16]. Therefore, lincRNA stability 
inevitably affects its function in post-transcriptional 
regulatory pathways. However, there have been only a 
few reports about lincRNAs stabilities [7-9,17,18]. Re- 
cently, Tani et al. determined the half-lives of transcribed 
RNAs in whole-genome of HeLa Tet-off (TO) cells by 
BRIC-Seq and found hundreds of short-lived noncoding 
transcripts (tl/2 < 4h) [19,20]. Clark et al. found that only 
a minority of IncRNAs were unstable by confirming the 
half-lives (half-life < 2h) of about 800 IncRNAs in the 
mouse Neuro-2a cell line with a custom non-coding RNA 
array [21]. These studies concluded that mRNAs and 
IncRNAs (including lincRNAs) have similar half-life distri- 
butions [11]. Furthermore, lincRNAs are transcriptionally 
activated similar to mRNAs, nearly always 5'-capped and 
3'-polyadenylated, and are frequently spliced [3,11]. It has 
been known that short-lived mRNAs were enriched 
among genes with regulatory functions, whereas long- 
lived mRNAs with metabolism and structure [22,23]. Like 
that, lincRNA stability should be closely associated with 
its physiological function. Although different inhibitor ap- 
proaches for identifying half-life have been documented in 
recent years, the stability of lincRNA is poorly understood. 
It limited functional research of lincRNAs in the complex 
post-transcriptional regulation. 

The human leukemia K562 cells have the potentials 
for differentiating into erythroid, granulocytic, monocytic 
and megakaryocytic lineages [24] . It is an important model 
cell in studying the pluripotency and differentiation of 
hematopoiesis and leukemogenesis [25,26]. Hence, we 
investigated the stability of lincRNAs in K562 cells and 



comparatively analyzed lincRNAs in ENCODE and our 
dataset named as PH which was sequenced on an Illu- 
mina HiSeq™ 2000 with pair-end libraries by RNA-Seq 
technology. The goals were to (1) improve the lincRNAs 
analysis workflow to acquire stringent lincRNA set; (2) 
compare and analyze the lincRNAs in ENCODE and PH 
datasets; (3) compare the stabilities of coding-protein 
RNAs and lincRNAs; (4) display lincRNAs distribution in 
human genome. The results suggest that lincRNAs with 
different stability may have diverse functions in various 
biological processes. 

Results 

Pipeline for lincRNA analysis 

In order to attain high-confidence lincRNA catalog, there 
are two main different improvements comparing to previ- 
ous lincRNA predicting pipeline. (1) The more rigorous set 
of intergenic transcripts was acquired from the intersection 
of intergenic transcripts by comparing with four database 
annotations (Ensembl [27], UCSC [28], Gencode [29] and 
Refseq [30]), respectively. (2) Putative novel lincRNAs were 
more strictly attained by filtering coding potential RNAs of 
the results including iseeRNA [31], CPAT [32], CPC [33] 
and PhyloCSF [34] (see Methods). Through the lincRNA 
predicting pipeline, unreliable transcripts were removed 
and reliable lincRNAs were remained in downstream 
analysis. 

The features of lincRNAs in PH and ENCODE 

In PH, 1804 lincRNAs were identified, 1564 of which 
(86.7%) were annotated with the Ensembl or Gencode, 
and the remaining 240 (13.3%) were putative novel 
lincRNAs. There were 312 lincRNAs (17.3%) with 
FPKM (Fragments Per Kilobase of exon model per 
Million mapped fragments) > 1, including 159 annotated 
lincRNAs and 153 putative novel lincRNAs. Simultan- 
eously, 1587 lincRNAs were identified in ENCODE, 1429 
of which (90.0%) were annotated with the Ensembl or 
Gencode, and the remaining 158 (10.0%) were putative 
novel lincRNAs. There were 514 lincRNAs (32.4%) with 
FPKM > 1, including 455 annotated lincRNAs and 59 
putative novel lincRNAs. LincRNAs in K562 cells, 
whether they were annotated or not in both datasets, 
were uniformly and prevalently distributed at every 
chromosome in human genome, even if most of the 
transcripts were low-abundant (FPKM <1) (Figure 1). 
By comparing lincRNAs of both datasets, 1009 lincR- 
NAs were unique in PH, 792 lincRNAs were unique in 
ENCODE (unique lincRNAs present in only PH or EN- 
CODE dataset), and 795 lincRNAs were overlapping in 
both datasets (overlapping lincRNAs present in both 
ENCODE and PH datasets) (Figure 2, Additional file 1: 
Table SI, Additional file 2: Table S2, Additional file 3: 
Table S3, Additional file 4: Table S4). 
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Figure 1 Distribution of the lincRNAs of K562 cells in human genome. The outside green circle is IncRNA location of Gencode v18 
annotation. The three circles in close proximity to genome are features of the ENCODE lincRNAs (putative novel lincRNAs, black; annotated 
lincRNAs, orange; the FPKM value (FPKM < SO) for histogram). The inner three circles are features of PH lincRNAs (putative novel lincRNAs, black; 
annotated lincRNAs, orange; the FPKM value (FPKM < 50) for histogram). The centric circle is the distribution of the overlapping lincRNAs of 
ENCODE and PH (khaki). 



ENCODE PH 




Figure 2 Venn diagram of lincRNAs between ENCODE and PH 

645 unique annotated lincRNAs and 147 unique putative novel 
lincRNAs of ENCODE display at the left; 780 unique annotated 
lincRNAs and 229 unique putative novel lincRNAs of PH appear 
at the right; 784 overlapping annotated lincRNAs and 1 1 
overlapping putative novel lincRNAs of both PH and ENCODE 
display at the middle. 



Comparison of minimum free energy and lincRNAs 
half-lives between PH and ENCODE 

Whether lincRNAs expressed with FPKM > 1 or FPKM < 1, 
the minimum free energy of most overlapping lincRNAs 
were higher than unique lincRNAs (Figure 3). Although the 
expressed abundances of the majority of overlapping lincR- 
NAs were higher than unique lincRNAs in the cases of 
FPKM < 1 (Figure 4), the minimum free energy and expres- 
sion level (FPKM) are not correlated (Pearson's correlation 
coefficient < 0.2 in both datasets). Comparing the deter- 
mined ncRNAs half-lives in Hela cells [19], 12 overlapping 
lincRNAs were found with half-life > 4 in both datasets, 2 
unique lincRNAs in PH and 3 unique lincRNAs in EN- 
CODE. In this study, random 10 overlapping lincRNAs 
half-lives (including FPKM > 1 and FPKM < 1, Figure 5) 
and random 9 unique lincRNAs half-lives (including 
FPKM > 1 and FPKM < 1 from PH or ENCODE, Figure 6) 
were determined by qPCR after ActD treatment [21]. 
The results proved that 10 overlapping lincRNAs 
half-lives are more than 4 hours (3 of which more 
than 30 hours), while 7 unique lincRNAs half-lives are less 
than 4 hours. 
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Figure 3 Comparison of the minimum free energy between the unique and overlapping lincRNAs in both datasets. Box-and-whisker plot. 
(Whiskers) 1 st— 99th percentile, without individual lincRNAs outside this. (Box) 25th-75th percentile. Difference calculated using a nonparametric 
Mann-Whitney t-test [(***) P < 0.000001], (A) Comparison of the minimum free energy of the unique and overlapping lincRNAs with FPKM < 1. 
(B) Comparison of the minimum free energy of the unique and overlapping lincRNAs with FPKM > 1. (PH, the unique lincRNAs in PH; ENCODE, 
the unique lincRNAs in ENCODE; COM, the overlapping lincRNAs of both datasets). 



Comparison of the minimum free energy of the common 
protein-coding RNAs and lincRNAs 

2914 protein-coding RNAs for the intersection of Cuff- 
compare's results with four public database annotations 
(Ensembl, UCSC, GENCODE and Refseq) were identified 
in PH, 2546 (87.4%) of which presented in ENCODE. 
Likewise, 795 lincRNAs (44.1%) in PH appeared in both 
datasets (Figure 7). By comparing the minimum free en- 
ergy of the common protein-coding RNAs and lincRNAs 
of the both datasets, the result showed that the minimum 
free energy of a large proportion of protein-coding RNAs 
was significantly higher than lincRNAs (Figure 8). 

Discussion 

In this study, the lincRNA analysis workflow was improved 
to acquire stringent lincRNA set. The method to acquire 
the intersection of intergenic transcripts is simpler and 
more convenient than integrating several database annota- 
tions by their scripts [4,35]. Moreover, possible novel lincR- 
NAs with uncertain coding potential (TUCPs) which 



possibly possess protein-coding or small peptides potential, 
were filtered by four softwares (iseeRNA, CPAT, CPC and 
PhyloCSF), respectively. Thereby more reliable putative 
novel lincRNAs were attained. 

The genome encodes far more lincRNAs than previ- 
ously known, and pervasively transcriptional lincRNAs 
might play widespread roles in gene regulation and other 
cellular processes [36]. In the study, thousands of lincR- 
NAs including annotated lincRNAs and putative novel 
lincRNAs were identified in both datasets through our 
lincRNA predicting pipeline. And, those lincRNAs were 
transcribed from thousands of locations at every chromo- 
some in human genome. 

The sequence of nucleotides of an RNA molecule 
carries the functional information. As more lincRNAs 
are identified, it will become important to study lincRNA 
sequences and their secondary structures to reveal 
mechanisms of lincRNA functions through establish- 
ing structure-function relations [11,37-40]. These roles 
including sensory, guiding, scaffolding and allosteric 
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Figure 4 Comparison of expression level between the unique and overlapping lincRNAs in both datasets. Box-and-whisker plot. 
(Whiskers) 1 st— 99th percentile, without individual lincRNAs outside this. (Box) 25th-75th percentile. Difference calculated using a nonparametric 
Mann-Whitney t-test [(***) P < 0.000001], (A) Comparison of expressed abundance of the unique and overlapping lincRNAs with FPKM < 1. (B) 
Comparison of expressed abundance of the unique and overlapping lincRNAs with FPKM > 1. (PH, the unique lincRNAs in PH; ENCODE, the 
unique lincRNAs in ENCODE; PH_COM, the overlapping lincRNAs in PH; ENCODE_COM, the overlapping lincRNAs in ENCODE). 
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Figure 5 Stabilities of overlapping lincRNAs in both datasets. LincRNA decay curves after blocking transcription in K562 with actinomycin D 
and measuring relative levels of lincRNA remaining relative to a control gene (GAPDH) by qPCR. The relative quantitative values at time 0 h were 
arbitrarily adjusted to 100%. The fitted curve was modeled using locally weighted polynomial regression (LOESS method). Results are from three 
biological replicates, and Error bars show standard deviation [(**) P < 0.01; (*) P <0.05, Student's t-test]. 



capacities derive from folded modular domains in 
lincRNAs [41]. Minimum free energy secondary structure, 
which is predicted with minimizing the free energy of a 
conformation according to a thermodynamic mode, is 
functionally important [42]. The lower the value of mini- 
mum free energy is, the more energy unfolding RNA re- 
quires, so that RNA can be unfolded to be bound by some 
factors (including miRNA) to trigger degradation processes. 
Hence, minimum free energy can partly reflect lincRNAs 
stability. Previous studies showed that protein-coding 
RNAs are more stable than IncRNAs (including lincR- 
NAs) [43], which agree with our result that the secondary 
structures of a large proportion of protein-coding RNAs 
are more stable than lincRNAs by the analysis of mini- 
mum free energy. 

The large variation in IncRNA stability is associated with 
their functional diversity [4,25,44]. Furthermore, IncRNAs 
can cluster into the same decay profiles which may be 
regulated by the same post-transcriptional regulatory 
pathways and/or contain similar regulatory sequences 
[21]. Previous studies propose that IncRNAs with more 
structural elements are more stable [21]. Through the 
analysis of Gene Ontology (GO) terms with differences 
in mRNA half-life distribution, Tani et al. found that 
short-lived mRNAs {ti/ 2 < 4 h) were significantly enriched 



in implicated regulatory functions, and long-lived mRNAs 
(ti/2 > 4 h) were disproportionately represented implicated 
housekeeping functions [19]. Furthermore, some lincR- 
NAs with short half-lives (ti/ 2 < 4 h) were well-known 
regulatory lincRNAs, and several ncRNAs with long half- 
lives (ti/2 > 4 h) were involved in housekeeping functions 
[19], as is the case with mRNAs. The median IncRNA 
(including lincRNA) half-life is 3.5 h (mean 4.8 h) [21], so 
unstable lincRNA is determined with half-life < 4 h, and 
stable lincRNA is determined with half-life > 4 h. 

Although the functions of a large proportion of both 
overlapping and unique lincRNAs in PH and ENCODE 
datasets were mysterious, we suggested their functions 
by correlating the stability of lincRNA to the functional 
categories. In this study, most overlapping lincRNAs 
were more stable than unique lincRNAs regardless of 
FPKM by the analysis of differences in minimum free en- 
ergy distribution and lincRNA half-lives. Moreover, there 
was no correlation between IncRNA (including lincRNA) 
expression and half-life [21], similar to the result that 
lincRNA expression and minimum free energy were not 
correlated. Highly stable mRNAs often encode highly 
stable proteins with "housekeeping" functions [45]. Simi- 
larly, relatively stable lincRNAs with long half-lives may 
serve "housekeeping" roles. Stable lincRNAs may avoid 
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Figure 6 Stabilities of unique lincRNAs in PH or ENCODE. LincRNA decay curves after blocking transcription in K562 with actinomycin D and 
measuring relative levels of lincRNA remaining relative to a control gene (GAPDH) by qPCR. The relative quantitative values at time 0 h were 
arbitrarily adjusted to 100%. The fitted curve was modeled using locally weighted polynomial regression (LOESS method). Results are from three 
biological replicates, and Error bars show standard deviation [(**) P < 0.01; (*) P <0.05, Student's t-test]. 
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Figure 7 The number of the protein-coding RNAs and lincRNAs 

in PH. 2914 protein-coding RNAs exist in PH, 2546 (87.4%) of which 
present in both datasets. 1804 lincRNAs exist in PH, 795 (44.1%) 
appear in both datasets. 
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Figure 8 Comparison of the minimum free energy between the 
common protein-coding RNAs and lincRNAs. Box-and-whisker 
plot. (Whiskers) 1 st— 99th percentile, without individual transcripts 
outside this. (Box) 25th-75th percentile. Difference calculated using 
a nonparametric Mann-Whitney t-test [(***) P < 0.000001]. 
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degradation for undertaking some functions through 
various mechanisms, such as the stable secondary struc- 
ture and interactions with RNA-binding proteins. The 
overlapping lincRNA XIST (half-life ~ 7 h) mediates 
X-chromosome inactivation in cis by recruiting a 
chromatin-modifying complex to specific sites [46] . It is a 
potent suppressor of hematologic cancer in mice, which 
may account for that human malignancies sometimes 
show X chromosome aneuploidies [17]. Additionally, 
MALAT-1, intergenic ~7 kb single exon transcript, has 
a highly conserved tRNA-like sequence at the 3' end 
and process to generated a short tRNA-like ncRNA 
mascRNA [47]. It is stable in human B cells and Hela 
(half-life > 7 h) and has been found to regulate alternative 
splicing of endogenous target genes in the light of the 
information of lncrnadb [48] . 

On the other hand, relatively unstable lincRNAs with 
short half-lives may be expressed in narrow time win- 
dows in response to external stimuli, which is vital for 
reflecting regulatory functions [19,21]. Low stabilities are 
characterized in many transcription factor mRNAs, and 
transcription factors can regulate gene expression in 
response to environmental signals through activating or 
repressing target genes [23,49]. Unstable lincRNAs such 
as H19 [18], HOTAIR [50] and GAS5 [51], could act al- 
most immediately after transcription without producing 
a functional gene product in the nucleus, so they might 
not require too long half-lives. Furthermore, unstable 
RNAs (including lincRNAs) associating with chromatin 
binding proteins suggested unstable lincRNA would be 
suitable for regulating gene expression [10,21]. The unique 
lincRNA H19 (half-life ~ 1 h) is abundantly expressed dur- 
ing embryonic development and down-regulated after birth 
and can regulate processing of miR-675 [18,52]. Addition- 
ally, HOTAIR with a half-life < 4 h in human Hela cells 
[19], has active role in modulating the cancer epigenome by 
binding to the polycomb repressive complex 2 (PRC2) [50]. 
GAS5 with a half-life < 4 h in human Hela cells [19], is a 
ribo-repressor by binding DNA domain of the gluco- 
corticoid receptor to influence cell survival and metabolic 
activities during starvation [51]. 

Conclusions 

We improved lincRNAs predicting pipeline to attain 
high-confidence lincRNAs and showed that lincRNAs are 
ubiquitous transcribed in K562 cells. By comparing both 
RNA-Seq datasets of K562 cells to explore the stability of 
lincRNAs, a large proportion of overlapping lincRNAs 
were more stable than unique lincRNAs by the analysis 
of differences in minimum free energy distribution and 
lincRNA half-lives. These results implied that overlapping 
lincRNAs (relatively stable linRNAs) and unique lincRNAs 
(relatively unstable lincRNAs) could be associated with dif- 
ferent functions, which will facilitate future experimental 



and computational investigations about lincRNA for 
leukemia disease. 

Methods 

Cell culture 

K562 cell line was cultured in RPMI 1640 medium 
(GIBCO, Life Technologies) supplemented with 10% (v/v) 
fetal bovine serum, 100 units/ml Penicillin, and 100 (ig/ml 
Streptomycin (P/S) (GIBCO, Life Technologies) in incuba- 
tor (5% C0 2 , at 37°C). 

RNA extraction, illumina library construction and 
sequencing 

Total RNA from K562 cells was prepared using Trizol 
reagent (Invitrogen.USA). Subsequently, they were used 
for mRNA purification and library construction with the 
Truseq™ RNA Sample Preparation Kit v2 (Illumina, San 
Diego, CA, USA) following the manufacturer's instructions. 
Our sample was named PH that was sequenced on an 
Illumina HiSeq™ 2000 (Illumina) with pair-end libraries in 
Encode Genomics Bio-Technology Co. (Suzhou, China). 

Preprocessing RNA-Seq datasets 

Transcriptome reconstruction of PH and ENCODE data- 
sets of K562 cells by RNA-Seq was performed respectively 
using rigorous read set through a sliding window filtering 
the average quality values within the window less than 20 
and the length of reads less than 35 bp by Trimmomatic 
[53]. After quality control, we obtained 90.7 million 2*100- 
base paired-end reads generated by Illumina Hiseq2000 
sequencing on polyadenylated selected (Poly-A + ) RNAs. 
On the other hand, ENCODE RNA-Seq dataset was incor- 
porated 112.3 million 2*76-base paired-end reads gener- 
ated by Illumina GAIIx sequencing on Poly-A + RNAs 
from NCBI Gene Expression Omnibus (GEO) database 
with accession number GSM765405. 

Obtaining annotated and putative novel lincRNAs 

Quality-control reads were aligned by TopHat (v2.0.7) 
[54], and transcripts of PH and ENCODE datasets were 
reconstructed by Cufflinks (v2.0.2) [55] with the Ensembl 
annotation, respectively. Because of strand-specific of 
ENCODE by RSeQC script [56], fr-firststrand library 
type was performed for ENCODE by Cufflinks. To elimin- 
ate all annotated non-lincRNA transcripts, the intersection 
of transcripts of the V category (unknown, intergenic tran- 
script in Cufflinks) was attained using cuffcompare script 
with four public databases annotations (annotated protein- 
coding genes, microRNAs, rRNAs, tRNAs and pseudo- 
genes), including Ensembl (Homo_sapiens.GRCh37.70.gtf), 
UCSC (hgl9), Gencode (gencode.vl5.annotation.gtf.gz) and 
Refseq (ref_GRCh37.plO_top_level.gff3) respectively. That 
is, the intersection of intergenic transcripts was acquired 
apart from all annotated non-lincRNA annotations of four 
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databases. Annotated lincRNAs were acquired through 
the intersection of intergenic transcripts to run Cuffcom- 
pare script again with Gencode IncRNAs annotation 
(gencode.vl8.long_noncoding_RNAs.gtf.gz). The remaining 
transcripts were possible novel lincRNAs. Then, possible 
novel lincRNAs were filtered on the basis of some charac- 
teristics including FPKM value (Due to the low expression 
of lincRNAs, we considered to acquire more lincRNAs 
based on the density distribution of lincRNAs. ENCODE, 
FPKM > 0.1; PH, FPKM > 0.01), length ( > 200 nt), ORF 
(< 100 codons) and exonic number ( > 2). After that, 
putative novel lincRNAs were acquired based on non- 
coding potential by integrating the results of four soft- 
wares including iseeRNA (noncoding), CPAT (no), CPC 
(noncoding) and PhyloCSF (score < 100) (Figure 9). 

iseeRNA, a lightweight SVM-based program, is de- 
signed for computational identification of lincRNAs 
from high-throughput transcriptome sequencing data. 
CPAT, which overcomes several intrinsic pairwise and 
multiple alignments limitations, uses logistic regres- 
sion model based on ORF size, ORF coverage, Fickett 
TESTCODE and Hexamer bias. CPC relys on pairwise 
alignment to assess the protein-coding potential of a 
transcript based on six biologically meaningful sequence 
features. PhyloCSF uses a multi-species nucleotide se- 
quence alignment to calculate the phylogenetic conser- 
vation score, which is likely to represent a protein-coding 
region. 
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Figure 9 An overview of our pipeline for obtaining annotated 
lincRNAs and putative novel lincRNAs from RNA-Seq dataset. 

TUCPs, the transcripts of uncertain coding potential; 'u', intergenic 
transcript in Cufflinks; '=', complete match of intron chain in Cufflinks 
(see Methods). 



Minimum free energy 

Minimum free energy secondary structure is assessed by 
minimum free energy (thermodynamic free energy). RNA- 
fold predicts minimum free energy and minimum energy 
secondary structures using a loop-based energy model 
and dynamic programming algorithm [57,58]. 

Half-life verification using quantitative real-time PCR 
(qRT-PCR) 

For experiment of half-life, cells were grown to -50% 
confluency before RNA polymerase activity was blocked 
by 10ug mf 1 actinomycin D (sigma) in DMSO. Transcrip- 
tion inhibition of K562 was carried out for 30 hours, and 
cells were harvested at time 0 h, 0.5 h, 2 h, 4 h, 8 h, 24 h 
and 30 h, respectively. 

Total RNA samples at different time were prepared 
using Trizol reagent (Invitrogen, Carlsbad, CA, USA). 
One microgram of RNA was reverse transcribed into 
cDNA using Reverse Transcriptase M-MLV (TAKARA, 
Japan), and diluted by 1:5 for RT-PCR assay. The assay 
was performed by using SYBR Premix Ex Taq (TAKARA, 
Japan) on ABI 7500 Real-Time PCR System. GAPDH was 
used as an internal control. 2" AA 1 values were calculated 
for each gene to show the fold change. All qRT-PCR reac- 
tions were performed in three biological replicates. 

We determined the lincRNA half-life by calculating 
the time when the fold change of expression abundance 
reached half of the initial expression abundance (0 h 
time point). Decay profile was modeled using locally 
weighted polynomial regression (LOESS method) (www. 
r-project.org). 

Reviewers' comments 

We appreciate the reviewer's comments from Prof Oliviero 
Carugo, Dr Alistair Forrest and Prof Manju Bansal. We 
have revised the manuscript accordingly. 

Reviewer #1 (First Round): Prof Oliviero Carugo, University 
of Vienna, Austria 
Major comments 

The main assumption of this manuscript is that overlap- 
ping lincRNAs (those that are found both in the ENCODE 
dataset and in the PH dataset) have functions different 
from unique lincRNAs (those that are found only in one 
dataset). In fact, on the basis of the data shown in the 
manuscript, this is an epistemological axiom (if not a pos- 
tulate) that might be questioned. The experimental obser- 
vation that five (5) unique lincRNAs are less stable than 
seven (7) overlapping lincRNAs is in fact insufficient to 
conclude that there is a functional difference. 

Authors' response: We have revised the whole paper to 
demonstrate our ideas. We showed that a large proportion 
of overlapping lincRNAs were more stable than the unique 
lincRNAs through the analysis of differences in minimum 
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free energy distribution. Furthermore, the experiments of 
lincRNA half-lives also supply the result. Through not only 
the half-lives of overlapping and unique lincRNAs but also 
the well-studied lincRNA with known half-life and previ- 
ous studies about RNA stablility, we suggested that over- 
lapping lincRNAs (relatively stable linRNAs) and unique 
lincRNAs (relatively unstable lincRNAs) could be associ- 
ated with different functions. 

As a consequence, the entire manuscript is highly 
speculative and this must be clearly pointed out in the 
discussion. 

Authors' response: We have added some evidences to 
supply our ideas and moderated some speculations which 
were insufficient evidences in the discussion. 

In the chapter entided "Obtaining annotated and putative 
novel lincRNAs", in the "Methods" section, a more detailed 
description must be provided to allow one to reproduce the 
algorithm. If this is too long, a supplementary file may be 
submitted (with some examples). 

Authors' response: We have re-written the lincRNA 
analysis worl<flow in the "Methods" section. 

Also the introductory section should be modified. The 
first part of the "Background" section is a good introduc- 
tion about the stability of non-coding RNAs and its func- 
tional implications. Some additional emphasis should be 
put on the relationships with cancer and diseases. Other- 
wise, the mention to medicalissues is inappropriate. The 
last part of the "Background" section, which is a short 
summary of the experiments and of the results of the 
manuscript, should be slightly expanded to be more easily 
understood by the reader. In its present form, it is too 
short and it becomes, inevitably cryptic. 

Authors' response: Thanks for the comments of reviewer. 
We have re-written the introduction and revised the whole 
article. 

I cannot inspect Figure 8 (resolution is too small). 
Authors' response: We have adjusted it in the revised 
version. 

Minor comments 

Section "Background" line 43-1 am expecting that the 
Authors describe briefly which cancers and which diseases 
are related to IncRNAs. The first sentence in this section 
is otherwise too vague for a scientific publication. 

Authors' response: We have revised the introduction 
and deleted the first sentence. 

Section "Background" line 46 - Please, check if the 
expression "... that are proved the importance ..." is 
correct. Perhaps it should be "... that are proven to be 
important ..." 

Authors' response: We have revised the introduction 
and deleted the sentence. 

Section "Background" line 62 - Please, check if the 
expression "... Clark et al found only a minority ..." is 



correct. Perhaps it should be "...Clark et al found that 
only a minority 

Authors' response: Thanks, we have corrected it in the 
revised version. 

Section "Background" line 68 - Please, check if the 
expression "... we improve lincRNA ..." is correct. Perhaps 
it should be "... we improve the lincRNA ..." 

Authors' response: We have corrected it in the revised 
version. 

Section "Background" lines 68-71 - These two sentences, 
which are a sort of summary of the experiments described 
in this manuscript, are a bit confusing. First, the Authors 
write that they compared "different RNA-Seq datasets. 
Then, they write that they compared "both RNA-Seq 
datasets". The question is: are they "several" or just 
"two"? Moreover, the Authors should describe briefly 
which are these datasets, how they were assembled, vali- 
dated, compared, etc. Just few sentences should be 
enough to improve the readability of the manuscript. 

Authors' response: We revised introduction and de- 
scribed the improved lincRNAs workflow in the "Methods" 
section. 

Section "Background" lines 72-73 - The expression 
"through randomly testifying ... unique lincRNAs" is not 
clear. The Authors should re-write it. 

Authors' response: The sentence has been re-written 
according to the reviewer's suggestion 

Section "Background" line 74 - The expression "coin- 
ciding" might be "in agreement with". 

Authors' response: We have deleted the sentence in the 
revised manuscript. 

Section "Background" line 70 - The Authors should 
justify why they selected K562 cells. 

Authors' response: It has been revised according to the 
reviewer's suggestion. 

Section "Background" lines 76-77 - The sentence 
"Therefore, we suggest ... lincRNA stability" is unclear 
and should be re-written. 

Authors' response: We rewrote it in the revised 
manuscript. 

Section "Results and discussion" line 85 - A reference 
to Cufflinks is mandatory. And it is also necessary to 
mention what is that (briefly). 

Authors' response: We have added the cited paper in 
the revised manuscript. 

Section "Results and discussion" line 86 - Probably it 
would be better to write simply "by considering the annota- 
tions present in XXX" or "by using the software XXX" and 
"(see Methods for details)". In its present form and without 
references, this list of resources is not really readable. 

Authors' response: We added the cited paper in the re- 
vised manuscript. 

Section "Results and discussion" lines 90-91 - The 
Authors should justify why their method is more effective 
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than alternative methods based on the integration of 
several databases. In fact, I understand that it is simpler. 
But I am not sure it is more effective. 

Authors' response: We have revised it. 

Section "Results and discussion" lines 103-104 - The 
Authors mention "four public database annotations". 
They should also indicate their names and where they 
can be found. 

Authors' response: It has been revised according to the 
reviewer's suggestion. 

Section "Results and discussion" line 104 - The Authors 
mention "PH" here for the first time: it is necessary to 
define it explicitly. If I am not wrong, it is defined only in 
the Abstract and this is not sufficient. 

Authors' response: PH was defined in the "Methods" 
section. 

Section "Results and discussion" lines 109-113 - The 
Authors describes the"minimum free energy". This should 
be described better and it is necessary to cite the compu- 
tational procedure used to compute this thermodynamic 
quantity. 

Authors' response: We have revised it in the "Methods" 
section according to the reviewer's suggestion. 

Section "Results and discussion" lines 116-120 - These 
sentences repeat the"Background" section and should be 
removed (or moved to the introductory section). 

Authors' response: These sentences been deleted 
according to the reviewer's suggestion in the revised 
manuscript. 

Section "Results and discussion" lines 126-127 - The 
expression "Comparing the both datasets, ..." might be 
inappropriate and might be even removed. 

Authors' response: It has been revised according to the 
reviewer's suggestion. 

Section "Results and discussion" line 128 - RNAfold 
can be used to compute something and it cannot be 
calculated. 

Authors' response: We have deleted the sentence in the 
revised manuscript. 

Section "Results and discussion" line 132 - The verb 
"testified" might be inappropriate. Why not "determined 
experimentally"? 

Authors' response: It has been revised according to the 
reviewer's suggestion. 

Section "Results and discussion" line 134 - The "5 
unique lincRNA" were taken from PH or from ENCODE? 

Authors' response: It has been revised according to the 
reviewer's suggestion. 

Section "Results and discussion" line 136 - A refer- 
ence to "qPCR after ActD treatment" is necessary. 

Authors' response: We have added the cited paper in 
the revised manuscript. 

Section "Results and discussion" lines 137-138 - A 
expression "... which is coordinated ..." might be 



inappropriate. Perhaps it might be changed into "... in 
agreement with 

Authors' response: We have deleted the sentence in the 
revised manuscript. 

Section "Results and discussion" line 143 - The verb 
"express" might be "may be expressed". Section "Results 
and discussion" line 151 - The verb "is" might be 
inserted between "It" and "highly". 

Authors' response: It has been revised according to the 
reviewer's suggestion. 

Section "Results and discussion" lines 150-157 - Caution: 
this discussion is highly speculative and it should be 
re-written. It is necessary to clearly indicate that these 
are mere suppositions of the Authors that are not 
based on any new results. 

Authors' response: We have re-written in the revised 
manuscript. 

Section "Results and discussion" line 164 - The expres- 
sion "FPKM" should be indicated extensively. 

Authors' response: We have re-written the sentence in 
the revised manuscript. 

Section "Methods" line 188 - The verb "were" might 
be "was". 

Authors' response: Thanks, we corrected it in the revised 
version. 

Section "Methods" line 196 - The sentence "Total RNA 
was extracted as described above" might be deleted. 

Authors' response: We have deleted the sentence in the 
revised manuscript. 

Section "Methods" line 231 - "We" might be "we". 

Authors' response: We are sorry about this mistake, 
and have corrected it in the revised version. 

Reference number 8. The journal name should be ab- 
breviated. The same goes for several other references. 

Authors' response: We fully checked the manuscript 
and revised similar question. 

Reference number 34. Volume and pages are missing. 
The same happens also in some other reference. 

Authors' response: We fully checked the manuscript 
and revised similar question. 

In the right part of Figure 1, the box named "Non- 
coding" might be modified by listing the four programs in 
the same order used to describe them in lines 93-100 of 
the section "Results and discussion": iseeRNA, CPC, Phy- 
loCSF, CPAT. In the present figure, CPAT is the second 
program and not the fourth. 

Authors' response: We have modified the description 
about Figure 9 in the "Methods" section. 

The caption of Figure 2 is unclear. It should be re- 
written. 

Authors' response: We have revised it. 

Figure 3 - Which units are used the measure the free 
energy? The data shown in the figure are taken from PH 
or Encode? It is also necessary to write what are the 
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thick horizontal black lines and what the error bars indi- 
cate. In fact, although most of the readers will understand 
this figure, it is mandatory to write an exhaustive legend. 

Authors' response: It has been revised according to the 
reviewer's suggestion. 

In Figure 4, "Encode" should be "ENCODE". 

Authors' response: We are sorry about this mistake, 
and have corrected it in the revised version. 

Figure 5 - See the observations about Figure 3. 

Authors' response: It has been revised according to the 
reviewer's suggestion. 

Figures 6 and 7 - These curves have very different 
shapes. Might the Authors try to describe them and 
write something about the differences? 

Authors' response: It has been revised according to the 
reviewer's suggestion. 

Reviewer #1 (Second Round): Prof Oliviero Carugo, 
University of Vienna, Austria 

The revised version of the manuscript is considerably 
better than the original version. However, in my opinion, 
there is still a very modest direct and experimental evidence 
of the difference between overlapping and unique lincR- 
NAs. As a consequence, the conclusions are extremely 
speculative. 

Authors' response: To explore the stability of two sepa- 
rated classes of lincRNAs by comparing both RNA-Seq 
datasets of K562 cells, we have analyzed minimum free 
energy distribution and measured the half-lives of the se- 
lected lincRNAs. The results supported the conclusion 
that the overlapping lincRNAs show more stable than the 
unique lincRNAs in K562 cell. Actually, the conclusion 
could be accepted in a general consideration that the 
overlapping lincRNAs expressed in both datasets should 
have more probability to be detected than the unique 
lincRNAs only expressed in one single dataset. Therefore, 
the lincRNAs existing in two datasets were likely more 
stable than those existing in one single dataset. 

Reviewer #2 (First Round): Dr Alistair Forrest, Omics 
Science Center, Japan 
Major comments 

1. "The method is simpler and more effective than inte- 
grating several database annotations by their own scripts 
[29,30]." Unless you provide benchmarking this is an 
unsupported statement. Should be removed. 

Authors' response: We did not compare the effective- 
ness without their scripts, however, our pipeline is simpler 
and more convenient than integrating several database 
annotations by the scripts. 

2. "our sequenced RNA-Seq dataset (PH)." Why is the 
dataset called PH? 

Authors' response: We are really grateful to the re- 
viewer's carefulness. A series of RNA-Seq datasets were 



sequenced in K562 cells by PMA or hemin treatment. 
PH was untreated RNA-Seq dataset. 

3. The main point of the paper seems to be that repro- 
ducible IncRNAs have higher free energy and inferred 
secondary structure than IncRNAs only observed in one 
dataset. I think this is likely to just reflect abundance. 
More highly expressed IncRNAs are more likley to be 
observed in multiple datasets. Weakly expressed IncRNAs 
are less likely. The authors could perhaps strengthen their 
story by looking at the relationship between expression 
level and stability by breaking the IncRNAs into several 
bins (low, mid, high) and examine the free energy (and 
half-life) in box plots. 

Authors' response: Thanks for the reviewer's helpful 
suggestions. We have taken this advice and analyzed the 
relation of the minimum free energy and expression level 
in the revised manuscript. However, they were not corre- 
lated, similar to no correlation between IncRNA (includ- 
ing UncRNA) expression and half-life. 

4. "highly expressed in ENCODE, but hardly expressed 
in PH." For these kind of statements it is very important 
to explain what RNA-seq protocol was used for ENCODE 
and PH. If the methods do not match this may explain 
why you see differences. 

Authors' response: We appreciate the reviewer's sugges- 
tion. ENCODE and PH datasets were sequenced following 
the manufacturer's instructions of lllumina. And we ana- 
lyzed them using the same method (see Methods). 

5. "Unstable lincRNAs are very sensitive, respond rapidly 
when transcription changes and act almost immediately 
after transcription without producing a functional gene 
product in the nucleus." Unsupported statement. You do 
not demonstrate that unstable IncRNAs are inherently 
'rapid responders', neither do you demonstrate 'without 
producing a functional gene product'. The manuscript 
should be re-read critically examining whether your state- 
ments are supported by your analysis or from a primary 
reference. 

Authors' response: We added the cited paper and 
revised the sentence. 

6. "Pervasive transcription of lincRNAs in K562 cells". 
This section doesn't really add anything. "514 lincRNAs 
(FPKM > =1) in ENCODE (80 for FPKM > =10), whilst 
there were 312 lincRNAs (FPKM>=1) in PH(30 for 
FPKM > =10). 89 overlapping lincRNAs of both datasets 
expressed with FPKM > =1." This does in no way suggest 
pervasive. 

Authors' response: We really thank for the reviewer's ques- 
tion. We have re-written the section. The distribution and 
expression of UncRNA were showed in the revised Figure 1. 

Minor comments 

1. "LncRNAs, located and transcribed from intergenic 
genomic regions that are proved the importance in cancer 
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by genome-wide studies, are named intergenic IncRNAs 
(lincRNAs)," Do you mean the IncRNAs or the genomic 
regions where they are found are associated with cancer? 
The references 5, 6 do not directly correspond to a cancer 
link. 

Authors' response: We have rewrote the introduction 
and deleted the sentence. 

2. "lincRNA" and "IncRNA" are used interchangably in 
the paper. lincRNA corresponds to 'intergenic', whereas 
IncRNA are just long. You should make a definition and 
stick to one, probably IncRNA. 

Authors' response: LincRNA is a subgroup of IncRNA. 
The features of IncRNA also are showed for lincRNA. 
Some cited papers were associated with IncRNA, however, 
IncRNA included lincRNA. 

3. "It is found that a large proportion of overlapping 
lincRNAs are more stable than unique lincRNAs". I sug- 
gest the authors not use the term 'overlapping lincRNAs' 
as this suggests genomic overlap. I think what the 
authors mean is 'IncRNAs observed in multiple K562 
datasets are more stable than IncRNAs unique to one 
K562 dataset' 

Authors' response: We have annotated overlapping 
lincRNAs and unique lincRNAs in the "Results" section. 
We compared PH and ENCODE datasets to attain over- 
lapping and unique lincRNAs in venn diagram (Figure 2). 
Unique UnRNAs presented in only PH or ENCODE data- 
set, and overlapping lincRNAs presented in both EN- 
CODE and PH datasets. 

4. "In light of this, we acquired a great deal of inter- 
genic transcripts involved possible novel lincRNAs and 
annotated lincRNAs with Ensembl or Gencode". Unclear 
what "transcripts involved possible novel" means. 

Authors' response: It has been revised according to the 
reviewer's suggestion. 

5. "Hexamer usage bais" Bias 

Authors' response: Thanks, we have corrected it in the 
revised version. 

6. "1804 lincRNAs were indentified" Identified 
Authors' response: We are sorry about this mistake, 

and have corrected it in the revised version. 

7. "We randomly testified the half-lives of 7" tested 
Authors' response: We have revised it. 

Reviewer #2 (Second Round): Dr Alistair Forrest, Omics 
Science Center, Japan 
Major points 

1. Ok. The authors have removed the sentence, still the 
section is entitled 132 

"Improved pipeline for lincRNA analysis". I would 
change to just "Pipeline for lincRNA analysis" as still you 
have provided no evidence that the pipeline is 'improved'. 

Authors' response: It has been revised according to the 
reviewer's suggestion. 



2. FIX. The authors must add a sentence at the very 
first use of PH explaining that PH is (PMA or Hemin 
treated K562s). 

Authors' response: We have added the annotation at 
the very first use of PH in the revised manuscript. In this 
study, PH was only named for our RNA-Seq dataset of 
untreated K562 cells. 

3. FIX. Figure 4 should not be split into A and B on 1 
FPKM. Only one set of boxplots should be shown. 

Authors' response: Figure 4 splits into A and B on 1 
FPKM in order to correspond to the minimum free energy 
(A and B in Figure ?>). It can clearly illuminate the cases 
of FPKM > 1 and FPKM < 1. 

4. OK. Actually GSM765405 was sequenced using the 
CSHL RNA-seq protocol not Truseq. However should 
be comparable. 

Authors' response: In this study, we have compared the 
protein-coding RNAs from both datasets by different se- 
quencing library construction methods. We found that 
2546 (87.4%) protein-coding RNAs in PH also presented 
in ENCODE, which showed that the results from CSHL 
RNA-seq protocol and Truseq are comparable. 

5. OK 

6. OK 

New minor comments 

1. line 139 PhloCSF [34]. PhyloCSF 

Authors' response: We have corrected the spelling mis- 
take in the revised version 

2. expressive abundance (FPKM). expression level or 
transcript abundance 

Authors' response: It has been revised according to the 
reviewer's suggestion. 

Reviewer #3 (First Round): Prof Manju Bansal, Indian 
Institute of Science, India 

The authors evaluate the stability of long intergenic 
non-coding RNA (lincRNA) in human K562 cell using 
the RNA-seq data. Two datasets of lincRNA are used in 
the study. The authors developed a pipeline to enriche 
lincRNA compared to that seen in ENCODE dataset. 
The stability of the lincRNA in these two dataset is com- 
pared, using mfe predicted by secondary structure pro- 
gram RNAfold. The reason to carry out this comparison 
is not clearly explained. It is no surprise that non-coding 
regions have lower stability than protein-coding regions. 

Authors' response: The secondary structure of lincRNA 
is important for its stability. We estimated the stabilities of 
overlapping and unique lincRNAs by bioinformatic method 
(minimum free energy). Furthermore, through the analysis 
of minimum free energy, our result agreed with previous re- 
port that lincRNA have lower stability than protein-coding 
RNA. That is, it was verification that the stability of 
lincRNA could be assessed using minimum free energy. 
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Further, the dataset is divided into overlapping and 
unique. Stability study shows that overlapping lincRNAs 
are more stable than unique lincRNAs. The explanation 
for the observed difference is not clearly stated. I cannot 
understand why lincRNA identified in two datasets 
(overlapping) should be more stable. 

Authors' response: We showed that a large proportion 
of overlapping lincRNAs were more stable than the 
unique lincRNAs through the analysis of differences in 
minimum free energy distribution and lincRNA half lives. 

RNA half-life studies are carried out on few lincRNAs 
under each category. The findings of the RNA half-life 
studies, using very few samples (5-7), cannot be general- 
ized for > 700 odd lincRNAs in each group. The conclu- 
sion that lincRNA stability may be related to function is 
already shown in Ref [26,28]. Overall, the conclusions 
made by the authors are rather weak or already 
established. 

Authors' response: Since lots of experiments may need 
too much time, we have added some experiments of 
lincRNA half-lives. We have revised the conclusions and 
suggested that overlapping lincRNAs (relatively stable 
linRNAs) and unique lincRNAs (relatively unstable 
lincRNAs) have different functions. 

The manuscript has several grammatical mistakes, 
starting from the very first sentence in the Introduction 
section. Many paragraphs have loose statements and ir- 
relevant information. E.g. See line 116 to 120. LincRNA 
and IncRNA are used interchangeably in many places. 

Authors' response: We really appreciate the reviewer's 
advice. We have revised the whole paper (including 
introduction) and deleted some irrelevant information. 

The methods section dealing with RNAfold need to be 
elaborated. Figures need to have proper units and labelling, 
e.g. MFE units. 

Authors' response: It has been revised according to the 
reviewer's suggestion. 

Reviewer #3 (Second Round): Prof Manju Bansal, Indian 
Institute of Science, India 

I had requested the author to explain the purpose of divid- 
ing the dataset into Overlapping and Unique. See earlier 
comment "Further, the dataset is divided into overlapping 
and unique. Stability study shows that overlapping 
lincRNAs are more stable than unique lincRNAs. The 
explanation for the observed difference is not clearly 
stated. I cannot understand why lincRNAs identified in 
two datasets (overlapping) should be more stable". The 
author's explanations are not satisfactory and have 
merely reiterated the aim of the work. 

Authors' response: The lincRNAs are considered to 
play very important roles in gene regulation which shows 
dynamic properties in cellular processes. Therefore, iden- 
tification of lincRNA stability is important to annotate 



its functions. In this paper, we classified the lincRNAs into 
overlapping lincRNAs and unique lincRNAs by comparing 
two RNA-Seq datasets of the same K562 cell line with venn 
figure (Figure 2). In general, it is easy to understand that 
the overlapping lincRNAs are more stable than the 
unique lincRNAs, because the expressing probability of 
lincRNA should have high value if it could be observed 
in two separated experiments from ENCODE and our 
own detection in K562 cells. That it, lincRNAs expressed 
in both experiments statistically have higher expressing 
probability than lincRNAs only expressed in one single 
experiment. We have carried out RNA-Seq of K562 cell 
line, compared our RNA-Seq dataset with the corre- 
sponding ENCODE dataset, and found that a large pro- 
portion of protein-coding RNAs (86.7%) of our dataset 
appeared in ENCODE, but relative small proportion of 
lincRNAs (44.1%) of our dataset appeared in ENCODE 
(Figure 7). We speculated this phenomenon arises due to 
the instability of lincRNAs during the cellular processes. 
We classified the overlapping part (overlapping lincR- 
NAs) and the unique part (unique lincRNAs) from two 
datasets, and compared their stabilities. We have proved 
that a large proportion of overlapping lincRNAs were 
more stable than the unique lincRNAs through the ana- 
lysis of differences in minimum free energy distribution 
and lincRNA half-lives. 

The authors postulate that the unique lincRNAs 
present in ENCODE or PH dataset should be function- 
ally different. This implies that the RNAs that are identi- 
fied and annotated as lincRNAs by both ENCODE and 
PH pipeline have similar function and this not true. 

Authors' response: We suggested that overlapping lincR- 
NAs (relatively stable linRNAs) and unique lincRNAs 
(relatively unstable lincRNAs) can be related to different 
functions, because they might be related to different cellu- 
lar processes. 

Further, minimum free energy of RNA is dependent 
on the GC content. A comment on the GC content of 
the overlapping and unique lincRNAs can explain the 
observed difference between the groups (Figure 3). Simi- 
lar observation is warranted for half-life based stability 
analysis. If the GC content of the randomly selected 
lincRNAs in the two groups (overlapping (10 sequence) 
and unique (7 sequence)) is different then the selection 
of lincRNAs is biased by the sequence property. Moreover, 
mRNA expression is known to be affected by GC content. 
(G Kudla - ‎2006, PloS Biology.). Authors can 
check this phenomenon by binning the lincRNAs based 
on GC content and correlated with the level of expression. 

Authors' response: In the previous studies, there is no 
correlation between IncRNA expression and its stability, 
although the signification correlation has been found for 
all RNAs (e.g. Clark et al. Genome Research, 2012, 
22:885-898). 
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PH dataset: full form of the abbreviation is given in 
the end (Methods section). 

But the first mention is in 'Introduction' (line 124). 

Authors' response: We have added the abbreviation of 
PH dataset directly in the introduction section where we 
first mentioned it. 

FPKM: the significance and full form of FPKM is not 
mention (Line 146) 

Authors' response: We have added the full form of 
FPKM in revised version according to the reviewer's 
suggestion. 

Line 82: 'beacause' 

Line 140-141: 'were remained to carry out' 
Line 147: 'annotatied' 

Authors' response: We have corrected the above spelling 
and grammar mistakes in the revised version according to 
the reviewer's instruction. 

Additional files 



Additional file 1: Table SI. The bed file of the unique lincRNAs in PH. 

Additional file 2: Table S2. The bed file of the unique lincRNAs in 
ENCODE. 

Additional file 3: Table S3. The bed file of the overlapping lincRNAs of 
both datasets. 

Additional file 4: Table S4. The result for running Cuffcompare script 
to compare lincRNAs of both datasets. 
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